More than Bag-of-Words: Sentence-based Document Representation for Sentiment Analysis
Authors
Abstract
Most sentiment analysis approaches rely on machine-learning techniques, using a bag-of-words (BoW) document representation as their basis. In this paper, we examine whether a more fine-grained representation of documents as sequences of emotionally-annotated sentences can increase document classification accuracy. Experiments conducted on a sentence and document level annotated corpus show that the proposed solution, combined with BoW features, offers an increase in classification accuracy.
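The idea of combining BoW features with a sentence-level representation can be illustrated with a minimal sketch. The abstract does not specify the paper's actual features, so everything here is an assumption made for illustration: the polarity label set, the choice to summarize the sentence sequence as label proportions plus the label of the final sentence, and the function names `bow_features`, `sentence_label_features`, and `combined_features`.

```python
from collections import Counter


def bow_features(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def sentence_label_features(sentence_labels, label_set):
    """Summarize a sequence of per-sentence labels.

    Hypothetical summary: the proportion of each label in the document,
    followed by a one-hot encoding of the last sentence's label.
    """
    total = len(sentence_labels) or 1  # avoid division by zero
    counts = Counter(sentence_labels)
    proportions = [counts[label] / total for label in label_set]
    last = sentence_labels[-1] if sentence_labels else None
    last_one_hot = [1.0 if label == last else 0.0 for label in label_set]
    return proportions + last_one_hot


def combined_features(text, sentence_labels, vocabulary, label_set):
    """Concatenate BoW counts with the sentence-level summary."""
    return bow_features(text, vocabulary) + sentence_label_features(
        sentence_labels, label_set
    )


# Toy document: four sentences, each already annotated with a label.
vocabulary = ["good", "bad", "plot"]
label_set = ["positive", "neutral", "negative"]
text = "good plot good acting bad ending"
labels = ["positive", "positive", "negative", "neutral"]

features = combined_features(text, labels, vocabulary, label_set)
print(features)
```

The resulting vector could be passed to any standard classifier. Unlike BoW alone, it keeps some information about how sentiment is distributed across the document's sentences.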
Similar resources
Emotion Classification of Chinese Microblog Text via Fusion of BoW and eVector Feature Representations
Sentiment Analysis has been a hot research topic in recent years. Emotion classification is more detailed sentiment analysis which cares about more than the polarity of sentiment. In this paper, we present our system of emotion analysis for the Sina Weibo texts on both the document and sentence level, which detects whether a text is sentimental and further decides which emotion classes it conve...
Evaluation of a General-Purpose Sentiment Lexicon on a Product Review Corpus
This paper introduces a new general-purpose sentiment lexicon called the WKWSCI Sentiment Lexicon and compares it with three existing lexicons. The WKWSCI Sentiment Lexicon is based on the 6of12dict lexicon, and currently covers adjectives, adverbs and verbs. The words were manually coded with a value on a 7-point sentiment strength scale. The effectiveness of the four sentiment lexicons for se...
The Role of Knowledge-based Features in Polarity Classification at Sentence Level
Though polarity classification has been extensively explored at document level, there has been little work investigating feature design at sentence level. Due to the small number of words within a sentence, polarity classification at sentence level differs substantially from document-level classification in that resulting bag-of-words feature vectors tend to be very sparse resulting in a lower ...
Compact Features for Sentiment Analysis
This work examines a novel method of developing features to use for machine learning of sentiment analysis and related tasks. This task is frequently approached using a “Bag of Words” representation – one feature for each word encountered in the training data – which can easily involve thousands of features. This paper describes a set of compact features developed by learning scores for words, ...
Rich Document Representation for Document Clustering
In traditional document clustering models, a document is considered as a bag of words. In this paper we present a new method for generating feature vectors, using the sentence fragments that are called logical terms and statements, in PLIR system. PLIR is a Knowledge-Based Information system based on the theory of the Plausible Reasoning. We have conducted a number of experiments using OHSUMED ...